Tag

#AI inference

20 articles

OpenAI reveals its first AI processor: Jalapeño

OpenAI has unveiled its first custom AI processor, Jalapeño, developed in partnership with Broadcom. The chip is designed for AI inference tasks and marks a move toward vertical integration in the AI industry.

Jun 24

AI inference startup Baseten reportedly raising $1.5B months after its last mega-round

Learn how to create and run a simple AI inference example, understanding the core concepts behind AI model deployment that companies like Baseten are building upon.

Jun 18

tech

Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs

Xiaomi's MiMo team, with TileRT, has achieved over 1000 tokens per second on a 1-trillion-parameter model using a single 8-GPU commodity node, marking a significant leap in LLM inference performance.

Jun 8

Perplexity AI Introduces Hybrid Local-Server Inference Orchestrator for Personal Computer: Automatic On-Device and Cloud Task Routing

Perplexity AI introduces a hybrid local-server inference orchestrator that automatically routes AI tasks between on-device and cloud models, enhancing both performance and privacy.

Jun 5

NVIDIA AI Releases Dynamo Snapshot: A CRIU-Based Fast Startup System for AI Inference on Kubernetes

NVIDIA introduces Dynamo Snapshot, a CRIU-based system that accelerates AI inference on Kubernetes by enabling fast startup and restoration of vLLM workers.

Jun 5

Perplexity built an “air-traffic controller” that decides in real time whether your AI query runs on your PC or in the cloud

Perplexity AI has introduced an intelligent system that dynamically splits AI workloads between local PCs and cloud servers, optimizing performance and cost.

Jun 2

tech

After Nvidia’s $20B not-acqui-hire, AI chip startup Groq reportedly raising $650M

AI chip startup Groq is reportedly raising $650 million in internal funding as it pivots from hardware to focus more on AI inference, the process of running trained AI models to generate responses to user prompts.

May 29

tech

Google launches a tiny board that runs Gemma 3 locally

Google introduces the Coral Board, a compact single-board computer that runs Gemma 3 locally, enabling on-device AI inference for edge computing applications.

May 28

Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving

Together AI open-sources OSCAR, an attention-aware 2-bit KV cache quantization system that significantly reduces memory usage and improves decoding speed for long-context LLMs.

May 25

tech

Marked-up Mac minis flood eBay amid shortages driven by AI

Marked-up Mac mini listings on eBay reflect growing demand for local AI inference hardware, where rising AI model complexity meets hardware scarcity and shifting market dynamics.

Apr 24

NVIDIA and Google infrastructure cuts AI inference costs

Google and NVIDIA are making AI inference cheaper and faster through new hardware and software integration, which could make AI more accessible to businesses and improve everyday applications.

Apr 23

tech

Your old iPad or Android tablet can be your new smart home panel - here's how

Learn how advanced AI optimization techniques enable repurposing old tablets as smart home control panels through edge computing and model compression.

Apr 17